Combining Data Integration and IE Techniques to Support Partially Structured Data
Abstract
A class of applications exists where the information to be stored is partially structured: that is, it consists partly of some structured data sources each conforming to a schema and partly of information left as free text. While investigating the requirements for querying partially structured data, we have encountered several limitations in the currently available approaches and we describe here three new techniques which combine aspects of Information Extraction with data integration in order to better exploit the data in these applications.
Similar resources
Combining data integration and information extraction
Improving the ability of computer systems to process text is a significant research challenge. Many applications are based on partially structured databases, where structured data conforming to a schema is combined with free text. Information is stored as text in these applications because the queries required...
Combining information extraction and data integration in the estest system
We describe an approach which builds on techniques from Data Integration and Information Extraction in order to make better use of the unstructured data found in application domains such as the Semantic Web which require the integration of information from structured data sources, ontologies and text. We describe the design and implementation of the ESTEST system which integrates available stru...
Combining Database and Information Extraction Techniques to Discover Structure From Partially Structured Data
This paper shows how Information Extraction and Semantic Web Ontology technologies can be combined with information integration techniques in the AutoMed framework to extend the facilities provided by databases for handling free text data. This paper gives a design for a demonstrator system ESTEST (Experimental Software to Extract Structure from Text). This design has several novel features. In...
Detecting Diseases in Medical Prescriptions Using Data Mining Tools and Combining Techniques
Data about the prevalence of communicable and non-communicable diseases, as one of the most important categories of epidemiological data, is used for interpreting health status of communities. This study aims to calculate the prevalence of outpatient diseases through the characterization of outpatient prescriptions. The data used in this study is collected from 1412 prescriptions for various ty...
Graph-Based Weakly-Supervised Methods for Information Extraction & Integration
The variety and complexity of potentially-related data resources available for querying (webpages, databases, data warehouses) has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of resea...
Publication date: 2008